Road defect inspection is a crucial task in maintaining good transportation infrastructure, as road
surface distress can reduce user comfort, shorten the lifetime of vehicle parts, and cause road
casualties. In recent years, machine learning has been adopted widely in various fields, including
object detection, thanks to its superior performance and the availability of the high computing power
generally needed for model training. Many works have reported using machine-learning-based object
detection algorithms to detect defects, such as cracks in buildings and roads. In this work, YOLOv5,
YOLOv6 and YOLOv7 models have been implemented and trained using a custom dataset of road cracks and
potholes, and their performances have been evaluated and compared. Experiments on the dataset show
that YOLOv7 has the highest performance, with a mAP@0.5 score of 79.0% and an inference time of
0.47 minutes for 255 test images.

Keywords: Object detection, Pavement maintenance, Road crack
1. INTRODUCTION

Roads are a vital means of transportation in many parts of the world. Various materials are used to
construct road pavements, including porous asphalt, stone mastic asphalt and gap-graded asphalt, among
others. Asphalt is prone to deterioration due to various factors, such as exposure to water and
surrounding temperatures, excessive traffic loads, execution mistakes, and lack of maintenance [1].
Road defects fall into four types: pavement cracks, surface deformations, disintegrations, and
surface defects. The size and shape of a road defect can be used to classify it into one of these
categories. Defects can also be graded into three severity levels: mild, moderate, and high [2].
Knowledge of the different types of road defects leads to a better understanding of their probable
causes and treatments [3]. As road pavement is meant to provide a smooth and comfortable ride and
sufficient surface resistance for safety, any deterioration of its surface must be detected at an
early stage for rapid treatment. Road distress identification is also essential for determining the
type of maintenance needed. There are three categories of detection techniques for road distress in
Malaysia: manual, semi-automatic, and automatic [4]. In recent years, machine learning and machine
vision have been adopted in various industry sectors and fields of study because of their benefits in
productivity, efficiency, and flexibility. Despite these benefits, challenges remain in applying
machine vision to road defect detection, such as hairline cracks that are difficult to detect,
limitations in detecting crack edges, and a lack of crack quantification data for subsequent road
maintenance. Recent research in transportation engineering has already explored the application of
machine learning to detecting road pavement deterioration. Convolutional neural networks (CNN),
artificial neural networks (ANN), K-means clustering and regression are some of the most widely used
methods thanks to their excellent performance [5].

The main purpose of object detection in road distress inspection is to detect the road defects in
images taken of the inspected roads and correctly classify them according to their types. Many
promising object detection algorithms are readily available. The most commonly used approaches are
you only look once (YOLO), single-shot detector (SSD) and CNN [6]. CNN is a deep learning algorithm
that aids parameter identification by separating an image into layers so that each layer is examined
and may be interpreted more precisely than with the standard analysis approach [7]. Typically, a CNN
is constructed from input, convolutional, pooling, fully connected, and output layers. One reported
network used three convolutional layers, two fully connected layers, and two neurons at the output
layer, since only two classes, crack and non-crack, were needed [8]. This CNN was tested on two
datasets: it achieved an accuracy of 96.99% on the CrackTree200 dataset and a highest accuracy of
98.8% on a self-collected dataset. Ma et al. [9] tested YOLOv3, YOLOv4s-mish, and YOLOv5s models on
cracks in timber structures, where YOLOv3 showed the best precision, with a mean average precision
(mAP) of 95.5%, while YOLOv5s, with a mAP of 92.9%, had the fastest training speed because it has the
simplest network structure. Meanwhile, Yan and Zhang [10] proposed an improved SSD network that adds
a deformable convolution to the backbone feature extraction for detecting cracks in asphalt highway
pavement, resulting in a mAP of 85.11%, which is 3.1% higher than the original SSD network.

Horvat et al. [11] used all of the YOLOv5 models to detect face masks in images; the YOLOv5x model
had the longest training time, 8.67 hours, while achieving the best performance, a mAP score of
77.1%. In another YOLOv5-based study, Yu [12] applied a threshold segmentation method based on the
Otsu maximum inter-class variance to the dataset before training the YOLOv5-s model. The improved
detection achieved 84.37% precision with the K-means method adopted. Next, Aburaed et al. [13]
evaluated the performance of YOLOv6 against YOLOv5 on crater detection and found that the claim that
YOLOv6 outperforms YOLOv5 could not be confirmed, as their relative performance was inconsistent
across scenarios. Meanwhile, Yang et al. [14] proposed a three-stage crack location and segmentation
method in which the image is first filtered by the Retinex method to remove redundant noise, then
passed to a detection stage using the proposed YOLO-SAMT, and lastly processed by K-means clustering
to extract the cracks. YOLO-SAMT is an enhanced algorithm in which the YOLOv7 architecture is
integrated with SimAM and a transformer, and it shows a 5.42% higher mAP score than the original
YOLOv7. In addition, road damage detection and classification on Google Street View data using
YOLOv7 with a label smoothing technique resulted in a higher F1 score of 81.7% [15].

The detection and classification of road defects using object detection algorithms such as YOLOv5,
YOLOv6, and YOLOv7 faces several challenges. Limited availability of high-quality training data, variations
in lighting, weather conditions, and road surfaces, and the difficulty in accurately distinguishing between 
different types of road defects are some of the critical issues to consider. In this context, the objectives of our 
paper are to evaluate and compare the performance of these algorithms in terms of accuracy, speed, and 
resource usage, investigate the impact of different data augmentation techniques, explore the use of inference 
and fine-tuning to improve the accuracy and assess the potential of these algorithms for real-time road defect 
detection and classification. By addressing these objectives and challenges, this research could contribute to 
improving the effectiveness and efficiency of road defect detection and classification using object detection 
algorithms. 

This paper is structured into five main sections. Section 2 provides an overview of the evolution of
the YOLO object detection algorithm, focusing on the YOLOv5, YOLOv6, and YOLOv7 variants. Section
3 outlines the methodology used in this study, including data collection and experimental setup. Section 4 
presents the results of the experiments conducted and includes a discussion of these results. Finally, section 5 
offers concluding remarks and summarizes the study's key findings. 


2. EVOLUTION OF YOLO 

YOLO was first introduced in 2015 with the release of the paper “You Only Look Once: Unified,
Real-Time Object Detection”, whose main purpose was to eliminate the multistage process of training
classifiers on bounding boxes and refining them, by executing only a single stage of object detection
and thereby speeding up inference [16]. Since the release of the first YOLO version, a series of
updated YOLO variants has been published by several different groups, each with its own significant
upgrades and features. Following the first version, the original authors published two more papers
with the release of YOLOv2 in 2017 and YOLOv3 in 2018 [16]. Bochkovskiy et al. [17] continued the
series with the release of YOLOv4 in 2020, as well as YOLOv7. These versions are regarded as the
official YOLO versions, while many other YOLO models such as YOLOR, YOLOX, PP-YOLOE, YOLOv5, and
YOLOv6 are labelled unofficial as they were published by other researchers. Among those, a few are
more popular among end users: for example, YOLOv5, published in 2020 by Ultralytics, and YOLOv6,
released by Meituan Inc in 2021, which has comparatively higher performance with its anchor-free
method. Some past studies have also analysed the performance of YOLO models. Jiang et al. [18]
compared the differences and relationships between the YOLOv1 to YOLOv5 architectures, finding that
YOLOv4 and YOLOv5 had similar, and the highest, performance in terms of speed and accuracy at that
time. Thuan [19] concurs with this comparison, while expecting further performance gains from YOLOv5
as it was newly released at that time. In this paper, three versions of YOLO, namely YOLOv5, YOLOv6
and YOLOv7, are adopted to compare their performance on road crack and pothole detection and
classification.

YOLO was initially developed to use bounding boxes with a corresponding threshold value to detect
objects in images precisely using a grid of cells. The YOLOv1 architecture was built on the Darknet
design, with 24 convolutional layers followed by two fully connected layers, inspired by GoogLeNet
[16]. To improve the algorithm, YOLOv2 added batch normalization and higher-resolution input, and
replaced the fully connected layers with anchor boxes, which improved recall by 7% and mAP by 2%
[20]. The model was developed further in YOLOv3, which has a more powerful backbone, DarkNet-53,
with 53 convolutional layers. It eliminates the softmax classifier, which limits the handling of
overlapping boxes, and adopts logistic regression instead [21]. Bochkovskiy et al. [17] designed the
enhanced YOLOv4 architecture with a new backbone, CSPDarkNet-53, a combination of the cross stage
partial network (CSPNet) and Darknet that consists of 29 convolutional layers, with the addition of
a spatial pyramid pooling (SPP) block as well as mosaic data augmentation, which uses a 4-image
mosaic instead of a single image during training.


2.1. YOLOv5, YOLOv6 and YOLOv7 algorithms 

Similar to YOLOv4, YOLOv5 uses CSPDarkNet-53 as its backbone and a path aggregation network (PANet)
as the neck to improve the effectiveness of data transfer inside the model, with the addition of a
focus layer that replaces YOLOv3’s head layers. However, the developer, Ultralytics, has not released
any paper on the model. Even though YOLOv5 has only a few architectural improvements over YOLOv4, it
is the first YOLO model implemented in PyTorch instead of DarkNet; the PyTorch framework is more
user-friendly and uses a language that is widely used in current machine learning. Furthermore, with
the focus technique, YOLOv5 models are 90% smaller than YOLOv4, which gives a much faster training
speed without impacting the mAP score [22]. Figure 1 presents the overall network architecture of
YOLOv5, which consists of three main parts: CSP-Darknet as the backbone, PANet as the neck, and the
YOLO layer as the head. CSP-Darknet applies the cross stage partial network strategy, which helps
minimise the excessive duplicate gradient information that arises from the use of residual blocks.
This strategy gives YOLOv5 a faster inference speed because fewer parameters and less computation are
used. PANet is a feature pyramid network used in the neck, where it improves pixel localization. The
head of YOLOv5 is similar to that of YOLOv3 and YOLOv4: it consists of three convolutional layers
that are crucial for calculating the bounding box coordinates.

In 2021, Meituan Inc published YOLOv6, designed mainly for industrial applications. It is also
written in PyTorch, is anchor-free, and has a reparametrized backbone called EfficientRep, in which
RepVGG is used for the nano and small models, while CSPStackRep is used for the medium and large
models. The neck structure is similar to that of YOLOv5, with a bi-directional concatenation (BiC)
for better localization accuracy, and the head decouples classification and detection. Overall,
YOLOv6 delivers better accuracy than the former versions and is 51% faster than previous
anchor-based models [23]. Figure 2 shows the overall network architecture of YOLOv6 [23].

YOLOv7 was released with the paper entitled “Trained bag-of-freebies sets new state of the art for
real-time object detectors,” which introduced a new model architecture that integrates the extended
efficient layer aggregation network (E-ELAN), grouping the computational blocks while leaving the
transition layers unchanged. The concatenation-based architecture is also scaled in depth and width
to adjust the inference speed, as seen in Figure 3. The overall improved architecture of YOLOv7
increases both detection accuracy and speed [24].

The overall comparison of the YOLO architectures from YOLOv5 up to YOLOv7 can be seen in Table 1.
Meanwhile, Figure 4 shows the average precision (AP) curves of YOLO models, where YOLOv7 achieved
the highest performance in terms of both speed and precision [24]. The following sections present
the methodology for implementing the three selected YOLO models, YOLOv5, YOLOv6 and YOLOv7, to
detect and classify road defects.

Figure 1. The network architecture of YOLOv5. It consists of three parts: backbone: CSP-Darknet,
neck: PANet, and head: YOLO layer [25]

Figure 2. YOLOv6 model architecture [23]

Figure 3. Compound scaling up depth and width for concatenation-based model [24]


Table 1. Architecture structure comparison of YOLOv5, YOLOv6 and YOLOv7

Layers        | YOLOv5                                        | YOLOv6                                      | YOLOv7
Backbone      | CSPDarknet-53                                 | RepVGG and CSPStackRep                      | E-ELAN
Neck          | PANet                                         | RepPAN                                      | PANet
Head          | 3 convolutional layers combined with ProtoNet | Decoupled classification and detection head | Lead head and auxiliary head
Loss function | Binary cross entropy and logit loss function  | Varifocal loss and distribution focal loss  | BCE with focal loss and IoU loss

Figure 4. Comparison of YOLO models performance based on AP curve


3. METHODOLOGY 
3.1. Data acquisition and pre-processing 

The images used in this work were acquired using a GoPro Hero 8 camera mounted at the back of a car,
as illustrated in Figure 5. The GoPro Hero 8 offers advantageous features such as image
stabilization, light weight, high image resolution, and practicality. Good image stabilization helps
because the camera was mounted on a moving car. The GoPro Hero 8 is also practical to mount on a car
since it is small and light, weighing only 117 g and measuring 6.2x3.2x4.5 cm. For the data
collection, the camera was set to video mode with a resolution of 1920x1080 pixels at 24 fps. A
linear digital lens was chosen to minimise barrel distortion. The camera was set at a height of
160 cm so that it captured the road surface over a width of 3.1 m, considered the largest typical
width of a road.


Figure 5. Camera setup on vehicle for data acquisition 


Videos of the road were captured in mp4 format, each lasting 5 to 10 minutes, at a
maximum speed of around 30 km/h. Images were extracted from the videos and saved in jpg format with a
resolution of 1920x1080 pixels. A total of 8396 images were extracted from all the videos acquired during 
data collection, and after manually filtering out images without any visible road defects, 3328 images 
remained. 
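
As a rough check on how densely the road surface is sampled, the distance the vehicle travels between consecutive video frames can be worked out from the stated maximum speed (around 30 km/h) and frame rate (24 fps). The sketch below is illustrative only: the function name is hypothetical, a constant speed is assumed, and the result is a derived figure rather than one reported in this work.

```python
def metres_per_frame(speed_kmh: float, fps: float) -> float:
    """Distance travelled by the vehicle between two consecutive frames."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    speed_ms = speed_kmh * 1000.0 / 3600.0  # convert km/h to m/s
    return speed_ms / fps


# Values stated in the text: at most about 30 km/h, recorded at 24 fps.
spacing = metres_per_frame(30.0, 24.0)
print(f"{spacing:.3f} m between frames")
```

At these settings consecutive frames are at most about a third of a metre apart along the road.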

Roboflow was chosen as the primary tool to annotate, split, and augment the images. The images were
annotated manually using bounding boxes. The annotated defects were divided into four classes:
crocodile cracks, longitudinal cracks, transverse cracks, and potholes. The image dataset was then
split into training, validation and test sets at a ratio of 7:2:1. The images were then augmented by
flipping them about both the vertical and horizontal axes, resulting in a total dataset of 4788
images split into 4000 training images, 533 validation images and 255 test images, with a final
image resolution of 640x360 pixels.
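
When an image is flipped, its bounding-box annotations have to be flipped with it. Roboflow performs this step internally; the sketch below only illustrates the idea and is not the tool's implementation. It assumes labels in the common YOLO text format (class id, box centre x and y, width, height, all normalized to the range 0 to 1). The function names and the example box are hypothetical.

```python
from typing import List, Tuple

# One YOLO-format label: (class_id, x_center, y_center, width, height),
# with all coordinates normalized to the range [0, 1].
Box = Tuple[int, float, float, float, float]


def flip_horizontal(boxes: List[Box]) -> List[Box]:
    """Mirror boxes left-right: only the x centre changes."""
    return [(c, 1.0 - x, y, w, h) for c, x, y, w, h in boxes]


def flip_vertical(boxes: List[Box]) -> List[Box]:
    """Mirror boxes top-bottom: only the y centre changes."""
    return [(c, x, 1.0 - y, w, h) for c, x, y, w, h in boxes]


# Hypothetical annotation; class id 3 is an arbitrary example.
labels = [(3, 0.25, 0.75, 0.10, 0.20)]
print(flip_horizontal(labels))
print(flip_vertical(labels))
```

Because the coordinates are normalized, the same functions apply regardless of whether the image is 1920x1080 or 640x360 pixels.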

Samples of images containing the four defect classes can be seen in Figure 6: crocodile crack in
Figure 6(a), longitudinal crack in Figure 6(b), pothole in Figure 6(c), and transverse crack in
Figure 6(d).

Figure 6. Image samples of road defects captured for each class; (a) crocodile crack, (b) longitudinal
crack, (c) pothole, and (d) transverse crack


3.2. Deployment of YOLO models for road crack detection 

YOLO models were chosen because of their proven fast inference speed and high accuracy. In this work,
the performance of YOLOv5, YOLOv6 and YOLOv7 models in road crack detection has been evaluated and
compared. The YOLOv5, YOLOv6 and YOLOv7 models were obtained from github.com/ultralytics/yolov5,
github.com/meituan/yolov6 and github.com/WongKinYiu/yolov7, respectively. They were trained using the
dataset described in the previous section. Google Colab, which offers high-performance GPUs, was used
for training the models. Roboflow was used to annotate the images, augment selected images, and
create the configuration files for model training. Training of each model was stopped after 100
epochs. Finally, inference was also done in Google Colab, although it does not require high
processing power and could have been done locally on a typical laptop. To find the best-performing
model in terms of both speed and accuracy, many variants of the YOLO architectures were investigated:
YOLOv5-n (nano), YOLOv5-s (small), YOLOv5-m (medium), YOLOv5-l (large), YOLOv5-x (extra-large),
YOLOv6-n, YOLOv6-s, YOLOv6-m, YOLOv6-l, YOLOv7-tiny, YOLOv7 and YOLOv7-x.
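
The configuration file mentioned above is typically a small YAML file that lists the image folders, the number of classes, and the class names. The sketch below builds such a file as plain text. It is an assumption about the general layout used by YOLO-style repositories, not a copy of the file used in this work: the folder names, the dataset root, and the order of the class names are all hypothetical.

```python
# Class names taken from the text; their order here is an assumption.
CLASS_NAMES = ["crocodile crack", "longitudinal crack", "transverse crack", "pothole"]


def build_data_yaml(root: str, names: list) -> str:
    """Return the text of a YOLO-style dataset YAML file."""
    quoted = ", ".join(f"'{n}'" for n in names)
    lines = [
        f"train: {root}/train/images",  # hypothetical folder layout
        f"val: {root}/valid/images",
        f"test: {root}/test/images",
        f"nc: {len(names)}",            # number of classes
        f"names: [{quoted}]",
    ]
    return "\n".join(lines) + "\n"


# "road-defects" is a placeholder dataset root.
print(build_data_yaml("road-defects", CLASS_NAMES))
```

Each of the three repositories provides its own training script that takes a dataset description of this kind as input; apart from the 100 training epochs, the hyperparameters are not stated in the text.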

The results obtained from each run were evaluated in terms of precision and accuracy. At the end of
each training run, the results were saved; they include precision, recall, mAP@0.5, and mAP over IoU
thresholds ranging from 0.5 to 0.95. The main metrics of interest are accuracy and mAP@0.5, the mean
average precision at an IoU threshold of 0.5. As accuracy is not included in the saved results, it
must be calculated from each training run's confusion matrix. The metrics are calculated as in
(1)-(5):


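$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$

$$AP = \frac{1}{n} \sum_{i=1}^{n} \text{Precision}(\text{Recall}_i) \tag{3}$$

$$mAP = \frac{1}{N} \sum_{j=1}^{N} AP_j \tag{4}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

Where TP is true positive, TN is true negative, FP is false positive, FN is false negative, and AP is
average precision; n is the number of recall levels at which precision is sampled, and N is the
number of classes.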
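
To make the count-based metrics concrete, the following minimal sketch evaluates equations (1), (2), (4) and (5) from confusion-matrix counts and per-class AP values. The function names and all numbers are hypothetical and chosen only for illustration; they are not results from this work.

```python
from typing import Sequence


def precision(tp: int, fp: int) -> float:
    """Equation (1): fraction of detections that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Equation (2): fraction of ground-truth defects that are detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Equation (5): fraction of all decisions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mean_average_precision(ap_per_class: Sequence[float]) -> float:
    """Equation (4): mean of the per-class AP values."""
    if not ap_per_class:
        raise ValueError("at least one AP value is required")
    return sum(ap_per_class) / len(ap_per_class)


# Hypothetical counts, for illustration only.
tp, tn, fp, fn = 80, 10, 10, 20
print(f"precision = {precision(tp, fp):.3f}")
print(f"recall    = {recall(tp, fn):.3f}")
print(f"accuracy  = {accuracy(tp, tn, fp, fn):.3f}")
print(f"mAP       = {mean_average_precision([0.9, 0.8, 0.7, 0.6]):.3f}")
```

With four defect classes, the mAP is simply the average of four AP values, one per class.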


4. RESULTS AND DISCUSSION 

Table 2 shows the performance results of all the models trained in this work. Since all models were
trained using the same instances and dataset for each run, the results can be compared directly.
YOLOv7-tiny has the shortest training time despite being in a larger size class than YOLOv5-s and
YOLOv6-s. Comparing models within their respective size classes, YOLOv7-tiny remains the fastest to
train overall. Regarding mAP, YOLOv7 has the highest score of 79.0%. However, YOLOv5-l trails by only
0.1 percentage points while having a training time almost 1 hour shorter. The YOLOv7 model also
records the highest accuracy, 87.16%. Among the YOLOv5 models, YOLOv5-l has the highest mAP, 78.9%,
with an accuracy of 85.65%, while among the YOLOv6 models, YOLOv6-l leads with a mAP of 72.32% and
an accuracy of 86.9%.


Table 2. Training performance results for YOLOv5, YOLOv6 and YOLOv7 models

Model       | Training time (hr) | mAP@0.5 (%) | Accuracy (%)
YOLOv5-n    | 3.71               | 74.50       | 86.19
YOLOv6-n    | 4.47               | 66.66       | 84.12
YOLOv7-tiny | 2.99               | 74.80       | 86.05
YOLOv5-s    | 4.42               | 77.00       | 85.55
YOLOv6-s    | 5.25               | 68.11       | 84.79
YOLOv5-m    | 4.67               | 78.40       | 86.21
YOLOv6-m    | 6.50               | 64.86       | 85.50
YOLOv7      | 5.78               | 79.00       | 87.16
YOLOv5-l    | 4.92               | 78.90       | 85.65
YOLOv6-l    | 8.33               | 72.32       | 86.90
YOLOv5-x    | 5.84               | 78.30       | 85.10
YOLOv7-x    | 9.25               | 76.30       | 86.05


Even though YOLOv5-l has a higher final mAP (78.9%) than YOLOv5-x (78.3%), Figure 7(a) shows that
YOLOv5-x maintains the highest curve of all the YOLOv5 models throughout the run. Meanwhile, in
Figure 7(b), YOLOv6-l exhibits the best performance of the four YOLOv6 models. Lastly, YOLOv7 and
YOLOv7-x improve at a similar rate throughout the run, with YOLOv7 outperforming YOLOv7-x in the
last few epochs, as shown in Figure 7(c). Figure 8 compares the best model of each YOLO version. The
mAP of YOLOv5-l increases more rapidly with the number of epochs than that of YOLOv7 in the initial
phase, but YOLOv7 outperforms YOLOv5-l towards the final phase of training. Thus, YOLOv5-l, YOLOv6-l
and YOLOv7 are chosen as the best models for each respective YOLO version.
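
The selection of one best model per YOLO version can be reproduced from the final mAP@0.5 values in Table 2. The sketch below is an illustrative re-computation only; the data structure and function name are hypothetical, and the selection criterion (highest final mAP@0.5 within each family) is a simplifying assumption that ignores the shape of the training curves.

```python
# (model, family, training time in hours, mAP@0.5 in %, accuracy in %), from Table 2.
RESULTS = [
    ("YOLOv5-n", "YOLOv5", 3.71, 74.50, 86.19),
    ("YOLOv6-n", "YOLOv6", 4.47, 66.66, 84.12),
    ("YOLOv7-tiny", "YOLOv7", 2.99, 74.80, 86.05),
    ("YOLOv5-s", "YOLOv5", 4.42, 77.00, 85.55),
    ("YOLOv6-s", "YOLOv6", 5.25, 68.11, 84.79),
    ("YOLOv5-m", "YOLOv5", 4.67, 78.40, 86.21),
    ("YOLOv6-m", "YOLOv6", 6.50, 64.86, 85.50),
    ("YOLOv7", "YOLOv7", 5.78, 79.00, 87.16),
    ("YOLOv5-l", "YOLOv5", 4.92, 78.90, 85.65),
    ("YOLOv6-l", "YOLOv6", 8.33, 72.32, 86.90),
    ("YOLOv5-x", "YOLOv5", 5.84, 78.30, 85.10),
    ("YOLOv7-x", "YOLOv7", 9.25, 76.30, 86.05),
]


def best_per_family(rows):
    """Return {family: (model, mAP)} for the highest mAP@0.5 in each family."""
    best = {}
    for name, family, _hours, m_ap, _acc in rows:
        if family not in best or m_ap > best[family][1]:
            best[family] = (name, m_ap)
    return best


for family, (name, m_ap) in sorted(best_per_family(RESULTS).items()):
    print(f"{family}: {name} (mAP@0.5 = {m_ap:.2f}%)")
```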

Figure 7. mAP@0.5 curves for models of (a) YOLOv5, (b) YOLOv6, and (c) YOLOv7

Figure 8. Comparison of mAP@0.5 curves for best models from YOLOv5, YOLOv6, and YOLOv7


To validate the best models obtained from training, YOLOv5-l, YOLOv6-l and YOLOv7 were tested further
by running inference on the 255 test images. The inference time of each best model is recorded in
Table 3, with YOLOv7 shown to be the fastest.


Table 3. Testing speed for inferencing 255 test images using YOLOv5, YOLOv6, and YOLOv7 best model

Model    | Inference time (minutes)
YOLOv5-l | 0.97
YOLOv6-l | 1.68
YOLOv7   | 0.47
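
The totals in Table 3 can be converted into a per-image time and an approximate throughput, which is a more convenient way to judge suitability for real-time use. The sketch below is a derived calculation, not a measurement from this work; the names are hypothetical, and it assumes the reported totals cover the whole inference run, including any loading overhead.

```python
# Total inference time in minutes for the 255 test images, from Table 3.
INFERENCE_MINUTES = {"YOLOv5-l": 0.97, "YOLOv6-l": 1.68, "YOLOv7": 0.47}
NUM_TEST_IMAGES = 255


def seconds_per_image(total_minutes: float, n_images: int) -> float:
    """Average wall-clock time spent on one image."""
    if n_images <= 0:
        raise ValueError("n_images must be positive")
    return total_minutes * 60.0 / n_images


for model, minutes in INFERENCE_MINUTES.items():
    spi = seconds_per_image(minutes, NUM_TEST_IMAGES)
    print(f"{model}: {spi:.3f} s/image, {1.0 / spi:.2f} images/s")
```

On these figures YOLOv7 processes roughly nine images per second, about twice the rate of YOLOv5-l.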


Four sample result images from each best model were compared based on the detection of the crack
classes. The confidence score displayed on each bounding box is used to analyse the models'
inference performance, alongside whether the detected cracks are labelled correctly. Figures 9 to 12
display sample inferred images for different types of cracks. Figures 9(a) to (c) compare the
confidence scores of YOLOv5-l, YOLOv6-l and YOLOv7 in detecting an obvious crocodile crack, where all
models give the same high score of 0.98. Figures 10(a) to (c) concern the accuracy of detecting
multiple cracks in one image: YOLOv5-l detects the second longitudinal crack that the other two
models miss, and also gives comparatively higher scores for the longitudinal crack and pothole
detected. Meanwhile, Figures 11(a) to (c) compare images with a combination of crocodile and
transverse cracks; the best results are from YOLOv5-l and YOLOv7, which have similar confidence
scores, with YOLOv5-l scoring 0.07 higher in detecting the transverse crack. Despite having lower
confidence scores than the other models, YOLOv6-l unexpectedly detected the obscure transverse
crack, as shown in Figure 12(b), whereas the other two models did not detect it at all, as seen in
Figures 12(a) and (c). From this comparison, it can be concluded that even though YOLOv5-l and YOLOv7
perform very similarly in inference on the images, YOLOv5-l has the upper hand in confidence score.


Figure 9. Inference test results on image with a crocodile crack using; (a) YOLOv5-l, (b) YOLOv6-l, and
(c) YOLOv7

Figure 10. Inference test results on image with a combination of crocodile crack, longitudinal crack and a
pothole using; (a) YOLOv5-l, (b) YOLOv6-l, and (c) YOLOv7

Figure 11. Inference test results on image with a combination of crocodile crack and transverse crack using;
(a) YOLOv5-l, (b) YOLOv6-l, and (c) YOLOv7

Figure 12. Inference test results on image with an obscure transverse crack using; (a) YOLOv5-l,
(b) YOLOv6-l, and (c) YOLOv7


5. CONCLUSION 

This paper evaluated the performance of three YOLO versions, YOLOv5, YOLOv6 and YOLOv7, in detecting
and classifying road defects. The YOLOv5-l and YOLOv7 models performed best among the 12 models
assessed, with very similar performance. Over a training dataset of 4000 images, YOLOv5-l had a
training time of 4.92 h, while YOLOv7 trained for 5.78 h, and they achieved mAP@0.5 scores of 78.9%
and 79.0%, respectively. This shows that YOLOv5-l has the upper hand in training time, as both
reached a similar precision. For inference on the 255 test images, YOLOv5-l took 0.97 minutes while
YOLOv7 took 0.47 minutes; in the comparison of confidence scores, YOLOv5-l scored higher. Thus, even
though YOLOv7 performs inference about twice as fast as YOLOv5-l, YOLOv5-l still has the advantage
in the confidence of the detected cracks. Nonetheless, due to resource limitations, such as
restricting each training run to 100 epochs and using a dataset of fewer than 5000 images at only
640x360 resolution, the results are confined to a single experimental configuration. To improve upon
these findings, future research could cover more YOLO models and use higher-resolution images
together with varying numbers of training epochs. Furthermore, additional pre-processing steps could
be applied to the dataset, and differences in inference on images with varying lighting could also
be explored.